Data Source:
Kaggle: Dog Breed Dataset (https://www.kaggle.com/kingburrito666/largest-dog-breed-data-set)
This dataset contains 10 years (from 2007 to 2017) of dog licenses record, including columns like LicenseType, Breed, Color, DogName, OwnerZip, ExpYear, ValidDate. I’ve used the datasets from 2013 to 2017 for visualizations.
Kaggle: Dog Characteristics (https://www.kaggle.com/rturley/pet-breed-characteristics?select=dog_breed_characteristics.csv)
This dataset contains 200+ dog breeds and their groupings, as well as other useful information about dog. The main info I used in this dataset are the breed name and the group of each breed.
Objectives of Visualizaton:
My goal for the visualization is to let people know more about DOGS. As the cutest creature in the world, they deserve the best care and love from humans. From my visualization, you’ll know more about dog breeds, dog groupings, as well as dog naming trend in recent years! Also, I will introduce if gender plays an important role in adoption/buying by human.
Software and package versions used:
Software:
Libraries:
From this plot, we can tell that BLACK, BROWN and WHITE are the most popular dog colors in this region. Mix colors like BLACK AND BROWN or WHITE AND BLACK or SPOTTED are also very popular.
There’s still a fair amount of dogs have not get neutered or spayed, which is not an ideal situation for domesticated dogs. But it is very possible that those not neutered dogs are breeding dogs.
This plot not only tell us what popular female/male dog names are, it also tells us those popular names are becoming less popular day by day while new dog names are gaining popularity.
This plot not only tell us what popular female/male dog names are, it also tells us those popular names are becoming less popular day by day while new dog names are gaining popularity.
From this plot we can tell that dogs are generally grouped into 7 groups, and we can tell the group of some dog breeds from their name, like “rat terrier” and “welsh terrier” belong to the TERRIER group.
From this map, we can tell that this dataset mainly records dog license type for the Allegheny county, some a tiny bit other nearby counties, which is not informed by the dataset description. We can also tell that the total number of dog license has a little bit decrease in 2017, which is kind of uncommon.
---
title: "Final Project"
output:
flexdashboard::flex_dashboard:
orientation: columns
source_code: embed
vertical_layout: scroll
---
About
=====================================
**Data Source**:
Kaggle: Dog Breed Dataset (https://www.kaggle.com/kingburrito666/largest-dog-breed-data-set)
This dataset contains 10 years (from 2007 to 2017) of dog licenses record, including columns like LicenseType, Breed, Color, DogName, OwnerZip, ExpYear, ValidDate. I've used the datasets from 2013 to 2017 for visualizations.
Kaggle: Dog Characteristics (https://www.kaggle.com/rturley/pet-breed-characteristics?select=dog_breed_characteristics.csv)
This dataset contains 200+ dog breeds and their groupings, as well as other useful information about dog. The main info I used in this dataset are the breed name and the group of each breed.
**Objectives of Visualizaton**:
My goal for the visualization is to let people know more about **DOGS**. As the cutest creature in the world, they deserve the best care and love from humans. From my visualization, you'll know more about dog breeds, dog groupings, as well as dog naming trend in recent years! Also, I will introduce if gender plays an important role in adoption/buying by human.
**Software and package versions used**:
Software:
- RStudio
Libraries:
- flexdashboard
- ggplot2
- hrbrthemes
- dplyr
- tidyr
- viridis
- lubridate
- ggpubr
- igraph
- stringr
- maps
- sf
- tigris
Visualizations
=====================================
```{r setup, include=FALSE}
library(flexdashboard)
library(ggplot2)
library(hrbrthemes)
library(dplyr)
library(tidyr)
library(viridis)
library(lubridate)
library(ggpubr)
library(igraph)
library(stringr)
library(maps, warn.conflicts = F, quietly = T)
library(sf)
library(tigris)
options(tigris_use_cache = TRUE)
```
Column {data-width=450}
-------------------------------------
### Exploratory
```{r}
#read in the datasets
dog_license_2013 <- read.csv("data/2013.csv")
dog_license_2014 <- read.csv("data/2014.csv")
dog_license_2015 <- read.csv("data/2015.csv")
dog_license_2016 <- read.csv("data/2016.csv")
dog_license_2017 <- read.csv("data/2017.csv")
```
```{r}
#remove the null values
dog_license_2013 <- na.omit(dog_license_2013)
dog_license_2014 <- na.omit(dog_license_2014)
dog_license_2015 <- na.omit(dog_license_2015)
dog_license_2016 <- na.omit(dog_license_2016)
dog_license_2017 <- na.omit(dog_license_2017)
```
```{r}
#merge 3 datasets into one
data_13_14 <- rbind(dog_license_2013, dog_license_2014)
data_15_16 <- rbind(dog_license_2015, dog_license_2016)
data <- rbind(data_13_14, data_15_16)
data <- rbind(data, dog_license_2017 )
```
```{r}
##extract the Gender from the License Type column
data$Gender = 0*nrow(data)
for(i in 1:nrow(data)){
if(grepl("Female", data$LicenseType[i])){
data$Gender[i] = "Female"
}else if(grepl("Male", data$LicenseType[i])) {
data$Gender[i] = "Male"
}else{
data$Gender[i] = "Not recorded"
}
}
data <- data %>% na_if("") %>% na.omit
```
```{r}
#extract the Senior Citizen or Disability condition from the License Type column
data$SeniorCitizenOrDisability= 0*nrow(data)
for(i in 1:nrow(data)){
if(grepl("Senior Citizen or Disability", data$LicenseType[i])){
data$SeniorCitizenOrDisability[i] = "Yes"
}else{
data$SeniorCitizenOrDisability[i] = "No"
}
}
```
```{r}
#extract the Neutered or Spayed condition from the License Type column
data$NeuteredOrSpayed= 0*nrow(data)
for(i in 1:nrow(data)){
if(grepl("Neutered", data$LicenseType[i]) || grepl("Spayed", data$LicenseType[i])){
data$NeuteredOrSpayed[i] = "Yes"
}else{
data$NeuteredOrSpayed[i] = "No"
}
}
```
```{r}
#get the unique colors and their counts
color_table <- table(data$Color)
color_table <- as.data.frame(color_table)
color_table <- color_table %>%
rename(
count = Freq,
color = Var1
)
color_table <- color_table[order(-color_table$count),]
color_table <- color_table[1:10,]
```
```{r}
p2 <- ggplot(color_table, aes(x=color, y=count, fill=color)) + geom_bar(stat="identity") + scale_fill_manual(values=c("#e3c3da", "#9bc4cc", "#56B4E9","#b1c7a9", "#e8acbb", "#f7cb5c", "#a48dc2","#edbaaf","#f2e791","#a87489"))+ggtitle("Top 10 Most Common Colors of Dogs from 2013 ~ 2017") + theme(axis.text.x = element_text(angle=45, vjust=1, hjust=1), plot.title = element_text(size = 14, face = "bold"), axis.title.x = element_text(size = 14, face="bold"), axis.title.y = element_text(size = 14, face="bold")) + xlab("Dog color") + ylab("Count") + theme(plot.margin = margin(0,0,0,0, "cm"))
p2
```
### Analysis
From this plot, we can tell that BLACK, BROWN and WHITE are the most popular dog colors in this region. Mix colors like BLACK AND BROWN or WHITE AND BLACK or SPOTTED are also very popular.
### Exploratory
```{r}
g <- ggplot(data, aes(NeuteredOrSpayed))+ geom_bar(aes(fill=Gender),width = 0.4) +scale_fill_manual(values=c("#e3c3da", "#a48dc2", "#9bc4cc")) + ggtitle("Histogram of Dog Neuter/Spay Condition Based on Gender in 2013 ~ 2017") + xlab("Neutered/Spayed Or Not") + ylab("Count")+theme(plot.title = element_text(size = 14, face = "bold"), axis.title.x = element_text(size = 14, face="bold"), axis.title.y = element_text(size = 14, face="bold")) + theme(plot.margin = margin(0,0,0,0, "cm"))
g
```
### Analysis
There's still a fair amount of dogs have not get neutered or spayed, which is not an ideal situation for domesticated dogs. But it is very possible that those not neutered dogs are breeding dogs.
### Time Series
```{r}
data_SeniorCitizenOrDisability <- data[data$SeniorCitizenOrDisability == "Yes",]
data_2013 <- data_SeniorCitizenOrDisability[data_SeniorCitizenOrDisability$ExpYear == "2013",]
data_2014 <- data_SeniorCitizenOrDisability[data_SeniorCitizenOrDisability$ExpYear == "2014",]
data_2015 <- data_SeniorCitizenOrDisability[data_SeniorCitizenOrDisability$ExpYear == "2015",]
data_2016 <- data_SeniorCitizenOrDisability[data_SeniorCitizenOrDisability$ExpYear == "2016",]
data_2017 <- data_SeniorCitizenOrDisability[data_SeniorCitizenOrDisability$ExpYear == "2017",]
```
```{r}
SeniorCitizenOrDisability_gender_2013 <- data_2013 %>% count(Gender, sort = TRUE)
SeniorCitizenOrDisability_gender_2013$Year <- "2013"
SeniorCitizenOrDisability_gender_2014 <- data_2014 %>% count(Gender, sort = TRUE)
SeniorCitizenOrDisability_gender_2014$Year <- "2014"
SeniorCitizenOrDisability_gender_2015 <- data_2015 %>% count(Gender, sort = TRUE)
SeniorCitizenOrDisability_gender_2015$Year <- "2015"
SeniorCitizenOrDisability_gender_2016 <- data_2016 %>% count(Gender, sort = TRUE)
SeniorCitizenOrDisability_gender_2016$Year <- "2016"
SeniorCitizenOrDisability_gender_2017 <- data_2017 %>% count(Gender, sort = TRUE)
SeniorCitizenOrDisability_gender_2017$Year <- "2017"
SeniorCitizenOrDisability_gender_13_14 <- rbind(SeniorCitizenOrDisability_gender_2013, SeniorCitizenOrDisability_gender_2014)
SeniorCitizenOrDisability_gender_15_16 <- rbind(SeniorCitizenOrDisability_gender_2015, SeniorCitizenOrDisability_gender_2016)
SeniorCitizenOrDisability_gender <- rbind(SeniorCitizenOrDisability_gender_13_14, SeniorCitizenOrDisability_gender_15_16)
SeniorCitizenOrDisability_gender <- rbind(SeniorCitizenOrDisability_gender, SeniorCitizenOrDisability_gender_2017)
SeniorCitizenOrDisability_gender <- as.data.frame(SeniorCitizenOrDisability_gender)
SeniorCitizenOrDisability_gender <- SeniorCitizenOrDisability_gender %>%
rename(
count = n,
)
SeniorCitizenOrDisability_gender$Year <- as.Date(ISOdate(SeniorCitizenOrDisability_gender$Year, 1, 1))
```
```{r}
p2 <- ggplot() + geom_area(aes(y = SeniorCitizenOrDisability_gender$count, x = Year, fill = Gender), data = SeniorCitizenOrDisability_gender,stat="identity")+scale_fill_manual(values=c("#e3c3da", "#a48dc2", "#9bc4cc")) + ggtitle("Gender Composition of Senior Citizens and Disability in 2013 ~ 2017") + xlab("Year") + ylab("Count of Senior Citizen and Disability")+ theme(plot.title = element_text(size = 14, face = "bold"), axis.title.x = element_text(size = 14, face="bold"), axis.title.y = element_text(size = 14, face="bold"))+theme(plot.margin = margin(0,0,0,0, "cm"))
p2
```
### Analysis
This plot not only tell us what popular female/male dog names are, it also tells us those popular names are becoming less popular day by day while new dog names are gaining popularity.
### Text
```{r}
#prepare the dog name count for all gender in each year
dogName_2013 <- data[data$ExpYear == "2013",]
dogName_2013 <- table(dogName_2013$DogName)
dogName_2013 <- as.data.frame(dogName_2013)
dogName_2013 <- dogName_2013 %>% rename(count = Freq, name = Var1)
dogName_2013 <- dogName_2013[order(-dogName_2013$count),]
dogName_2013 <- dogName_2013[1:5,]
dogName_2013$year <- "2013"
dogName_2014 <- data[data$ExpYear == "2014",]
dogName_2014 <- table(dogName_2014$DogName)
dogName_2014 <- as.data.frame(dogName_2014)
dogName_2014 <- dogName_2014 %>% rename(count = Freq, name = Var1)
dogName_2014 <- dogName_2014[order(-dogName_2014$count),]
dogName_2014 <- dogName_2014[1:5,]
dogName_2014$year <- "2014"
dogName_2015 <- data[data$ExpYear == "2015",]
dogName_2015 <- table(dogName_2015$DogName)
dogName_2015 <- as.data.frame(dogName_2015)
dogName_2015 <- dogName_2015 %>% rename(count = Freq, name = Var1)
dogName_2015 <- dogName_2015[order(-dogName_2015$count),]
dogName_2015 <- dogName_2015[1:5,]
dogName_2015$year <- "2015"
dogName_2016 <- data[data$ExpYear == "2016",]
dogName_2016 <- table(dogName_2016$DogName)
dogName_2016 <- as.data.frame(dogName_2016)
dogName_2016 <- dogName_2016 %>% rename(count = Freq, name = Var1)
dogName_2016 <- dogName_2016[order(-dogName_2016$count),]
dogName_2016 <- dogName_2016[1:5,]
dogName_2016$year <- "2016"
dogName_2017 <- data[data$ExpYear == "2017",]
dogName_2017 <- table(dogName_2017$DogName)
dogName_2017 <- as.data.frame(dogName_2017)
dogName_2017 <- dogName_2017 %>% rename(count = Freq, name = Var1)
dogName_2017 <- dogName_2017[order(-dogName_2017$count),]
dogName_2017 <- dogName_2017[1:5,]
dogName_2017$year <- "2017"
dogName_count_13_14 <- rbind(dogName_2013,dogName_2014)
dogName_count_15_16 <- rbind(dogName_2015,dogName_2016)
dogName_count<- rbind(dogName_count_13_14,dogName_count_15_16)
dogName_count <- rbind(dogName_count, dogName_2017)
```
```{r}
#prepare dog name count for female in each year
female_dogname <- data[data$Gender == "Female",]
female_dogName_2013 <- female_dogname[female_dogname$ExpYear == "2013",]
female_dogName_2013 <- table(female_dogName_2013$DogName)
female_dogName_2013 <- as.data.frame(female_dogName_2013)
female_dogName_2013 <- female_dogName_2013 %>% rename(count = Freq, name = Var1)
female_dogName_2013 <- female_dogName_2013[order(-female_dogName_2013$count),]
female_dogName_2013 <- female_dogName_2013[1:5,]
female_dogName_2013$year <- "2013"
female_dogName_2014 <- female_dogname[female_dogname$ExpYear == "2014",]
female_dogName_2014 <- table(female_dogName_2014$DogName)
female_dogName_2014 <- as.data.frame(female_dogName_2014)
female_dogName_2014 <- female_dogName_2014 %>% rename(count = Freq, name = Var1)
female_dogName_2014 <- female_dogName_2014[order(-female_dogName_2014$count),]
female_dogName_2014 <- female_dogName_2014[1:5,]
female_dogName_2014$year <- "2014"
female_dogName_2015 <- female_dogname[female_dogname$ExpYear == "2015",]
female_dogName_2015 <- table(female_dogName_2015$DogName)
female_dogName_2015 <- as.data.frame(female_dogName_2015)
female_dogName_2015 <- female_dogName_2015 %>% rename(count = Freq, name = Var1)
female_dogName_2015 <- female_dogName_2015[order(-female_dogName_2015$count),]
female_dogName_2015 <- female_dogName_2015[1:5,]
female_dogName_2015$year <- "2015"
female_dogName_2016 <- female_dogname[female_dogname$ExpYear == "2016",]
female_dogName_2016 <- table(female_dogName_2016$DogName)
female_dogName_2016 <- as.data.frame(female_dogName_2016)
female_dogName_2016 <- female_dogName_2016 %>% rename(count = Freq, name = Var1)
female_dogName_2016 <- female_dogName_2016[order(-female_dogName_2016$count),]
female_dogName_2016 <- female_dogName_2016[1:5,]
female_dogName_2016$year <- "2016"
female_dogName_2017 <- female_dogname[female_dogname$ExpYear == "2017",]
female_dogName_2017 <- table(female_dogName_2017$DogName)
female_dogName_2017 <- as.data.frame(female_dogName_2017)
female_dogName_2017 <- female_dogName_2017 %>% rename(count = Freq, name = Var1)
female_dogName_2017 <- female_dogName_2017[order(-female_dogName_2017$count),]
female_dogName_2017 <- female_dogName_2017[1:5,]
female_dogName_2017$year <- "2017"
```
```{r}
female_dogName_count_13_14 <- rbind(female_dogName_2013,female_dogName_2014)
female_dogName_count_15_16 <- rbind(female_dogName_2015,female_dogName_2016)
female_dogName_count<- rbind(female_dogName_count_13_14,female_dogName_count_15_16)
female_dogName_count <- rbind(female_dogName_count, female_dogName_2017)
```
```{r}
#prepare dog name count for male in each year
male_dogname <- data[data$Gender == "Male",]
male_dogName_2013 <- male_dogname[male_dogname$ExpYear == "2013",]
male_dogName_2013 <- table(male_dogName_2013$DogName)
male_dogName_2013 <- as.data.frame(male_dogName_2013)
male_dogName_2013 <- male_dogName_2013 %>% rename(count = Freq, name = Var1)
male_dogName_2013 <- male_dogName_2013[order(-male_dogName_2013$count),]
male_dogName_2013 <- male_dogName_2013[1:5,]
male_dogName_2013$year <- "2013"
male_dogName_2014 <- male_dogname[male_dogname$ExpYear == "2014",]
male_dogName_2014 <- table(male_dogName_2014$DogName)
male_dogName_2014 <- as.data.frame(male_dogName_2014)
male_dogName_2014 <- male_dogName_2014 %>% rename(count = Freq, name = Var1)
male_dogName_2014 <- male_dogName_2014[order(-male_dogName_2014$count),]
male_dogName_2014 <- male_dogName_2014[1:5,]
male_dogName_2014$year <- "2014"
male_dogName_2015 <- male_dogname[male_dogname$ExpYear == "2015",]
male_dogName_2015 <- table(male_dogName_2015$DogName)
male_dogName_2015 <- as.data.frame(male_dogName_2015)
male_dogName_2015 <- male_dogName_2015 %>% rename(count = Freq, name = Var1)
male_dogName_2015 <- male_dogName_2015[order(-male_dogName_2015$count),]
male_dogName_2015 <- male_dogName_2015[1:5,]
male_dogName_2015$year <- "2015"
male_dogName_2016 <- male_dogname[male_dogname$ExpYear == "2016",]
male_dogName_2016 <- table(male_dogName_2016$DogName)
male_dogName_2016 <- as.data.frame(male_dogName_2016)
male_dogName_2016 <- male_dogName_2016 %>% rename(count = Freq, name = Var1)
male_dogName_2016 <- male_dogName_2016[order(-male_dogName_2016$count),]
male_dogName_2016 <- male_dogName_2016[1:5,]
male_dogName_2016$year <- "2016"
male_dogName_2017 <- male_dogname[male_dogname$ExpYear == "2017",]
male_dogName_2017 <- table(male_dogName_2017$DogName)
male_dogName_2017 <- as.data.frame(male_dogName_2017)
male_dogName_2017 <- male_dogName_2017 %>% rename(count = Freq, name = Var1)
male_dogName_2017 <- male_dogName_2017[order(-male_dogName_2017$count),]
male_dogName_2017 <- male_dogName_2017[1:5,]
male_dogName_2017$year <- "2017"
```
```{r}
male_dogName_count_13_14 <- rbind(male_dogName_2013,male_dogName_2014)
male_dogName_count_15_16 <- rbind(male_dogName_2015,male_dogName_2016)
male_dogName_count<- rbind(male_dogName_count_13_14,male_dogName_count_15_16)
male_dogName_count <- rbind(male_dogName_count, male_dogName_2017)
```
```{r, fig.width=6, fig.height=4}
allGender <-ggplot(dogName_count, aes(x=year, y=count, group=name)) +
geom_line(aes(color=name))+
geom_point(aes(color=name))+ggtitle("All Gender: Top 5 Dog Names in 2013~2017") + ylab("Name Frequency")+xlab("Year") + scale_color_manual(values=c("#c9461e", "#913767", "#56B4E9","#dc54e8", "#a18e25", "#454140", "#a48dc2"))+geom_text(data = subset(dogName_count, year == "2017"), aes(label = name, colour = name, x = Inf, y = count, hjust=1.5), size=2.5)
female <-ggplot(female_dogName_count, aes(x=year, y=count, group=name)) +
geom_line(aes(color=name))+
geom_point(aes(color=name))+ggtitle("Female")+ ylab("Name Frequency")+xlab("Year") + scale_color_manual(values=c("#c9461e", "#913767", "#dc54e8","#a18e25", "#a48dc2", "#226133"))+geom_text(data = subset(female_dogName_count,year=="2017"), aes(label = name, colour = name, x = Inf, y = count, hjust=1), size=2)+ theme(legend.position='none')
male <-ggplot(male_dogName_count, aes(x=year, y=count, group=name)) +
geom_line(aes(color=name))+
geom_point(aes(color=name))+ggtitle("Male")+ ylab("Name Frequency")+xlab("Year") + scale_color_manual(values=c("#56B4E9", "#6c737a", "#b1c7a9","#454140", "#9c998e", "#9bc4cc"))+geom_text(data = subset(male_dogName_count, year == "2017"), aes(label = name, colour = name, x = Inf, y = count, hjust=1), size=2)+ theme(legend.position='none')
figure <- ggarrange(allGender, ggarrange(female, male, ncol = 2), nrow = 2)
figure
```
### Analysis
This plot not only tell us what popular female/male dog names are, it also tells us those popular names are becoming less popular day by day while new dog names are gaining popularity.
Column {data-width=550}
-------------------------------------
### Networks
```{r}
breed_count <- table(data$Breed)
breed_count <- as.data.frame(breed_count)
breed_count <- breed_count %>%
rename(
count = Freq,
breed = Var1
)
breed_count <- breed_count[order(-breed_count$count),]
breed_count$breed <- tolower(breed_count$breed)
```
```{r}
dog_characteristics <- read.csv("data/dog_breed_characteristics.csv")
```
```{r}
dog_characteristics$BreedName <- tolower(dog_characteristics$BreedName)
dog_breed_group <- dog_characteristics %>% rename(breed = BreedName)
dog_breed_group <- dog_breed_group %>% inner_join(breed_count, by="breed")
dog_breed_group <- dog_breed_group[dog_breed_group$Group1 != "",]
```
```{r, fig.width=10, fig.height=8}
set.seed(140)
relations <- data.frame(from=dog_breed_group$breed, to=dog_breed_group$Group1)
g <- graph_from_data_frame(relations, directed = FALSE)
V(g)[1:107]$color = "#E2E4F6"
V(g)[108:114]$color <- "#E7C8DD"
V(g)$label.color = "#2a353d"
V(g)[108:114]$label.color = "black"
V(g)$label.cex = 0.8
V(g)[108:114]$label.cex = 1.5
E(g)$width = 1
co <- layout.auto(g)
par(mar=c(0, 0, 1, 0))
plot(g, vertex.size=13, main = "The Groupings of 100+ Dog Breeds")
```
### Analysis
From this plot we can tell that dogs are generally grouped into 7 groups, and we can tell the group of some dog breeds from their name, like "rat terrier" and "welsh terrier" belong to the TERRIER group.
### Geospatial
```{r}
data$OwnerZip <- as.character(data$OwnerZip)
data_2013 <- data[data$ExpYear=="2013",]
data_2014 <- data[data$ExpYear=="2014",]
data_2015 <- data[data$ExpYear=="2015",]
data_2016 <- data[data$ExpYear=="2016",]
data_2017 <- data[data$ExpYear=="2017",]
```
```{r}
zipcode_count_2013 <- table(data_2013$OwnerZip)
zipcode_count_2013 <- as.data.frame(zipcode_count_2013)
zipcode_count_2013 <- zipcode_count_2013 %>% rename(count = Freq,zipcode = Var1)
zipcode_count_2013 <- zipcode_count_2013[order(-zipcode_count_2013$count),]
zipcode_count_2013$zipcode <- as.character(zipcode_count_2013$zipcode)
zipcode_count_2013$year <- "2013"
zipcode_count_2014 <- table(data_2014$OwnerZip)
zipcode_count_2014 <- as.data.frame(zipcode_count_2014)
zipcode_count_2014 <- zipcode_count_2014 %>% rename(count = Freq,zipcode = Var1)
zipcode_count_2014 <- zipcode_count_2014[order(-zipcode_count_2014$count),]
zipcode_count_2014$zipcode <- as.character(zipcode_count_2014$zipcode)
zipcode_count_2014$year <- "2014"
zipcode_count_2015 <- table(data_2015$OwnerZip)
zipcode_count_2015 <- as.data.frame(zipcode_count_2015)
zipcode_count_2015 <- zipcode_count_2015 %>% rename(count = Freq,zipcode = Var1)
zipcode_count_2015 <- zipcode_count_2015[order(-zipcode_count_2015$count),]
zipcode_count_2015$zipcode <- as.character(zipcode_count_2015$zipcode)
zipcode_count_2015$year <- "2015"
zipcode_count_2016 <- table(data_2016$OwnerZip)
zipcode_count_2016 <- as.data.frame(zipcode_count_2016)
zipcode_count_2016 <- zipcode_count_2016 %>% rename(count = Freq,zipcode = Var1)
zipcode_count_2016 <- zipcode_count_2016[order(-zipcode_count_2016$count),]
zipcode_count_2016$zipcode <- as.character(zipcode_count_2016$zipcode)
zipcode_count_2016$year <- "2016"
zipcode_count_2017 <- table(data_2017$OwnerZip)
zipcode_count_2017 <- as.data.frame(zipcode_count_2017)
zipcode_count_2017 <- zipcode_count_2017 %>% rename(count = Freq,zipcode = Var1)
zipcode_count_2017 <- zipcode_count_2017[order(-zipcode_count_2017$count),]
zipcode_count_2017$zipcode <- as.character(zipcode_count_2017$zipcode)
zipcode_count_2017$year <- "2017"
```
```{r}
#use a dataset to convert zip code to FIP code, because we will use FIP codes to draw a map.
fips_zip <- read.csv("data/fips_zip_x.csv")
fips_zip$ZCTA5 <- str_pad(fips_zip$ZCTA5, 5, pad = "0")
fips_zip <- fips_zip %>% select(ZCTA5, GEOID)
fips_zip <- fips_zip %>%rename(zipcode = ZCTA5)
```
```{r}
zipcode_count_2013 <- zipcode_count_2013 %>% inner_join(fips_zip, by="zipcode")
zipcode_count_2013$GEOID <- as.character(zipcode_count_2013$GEOID)
zipcode_count_2014 <- zipcode_count_2014 %>% inner_join(fips_zip, by="zipcode")
zipcode_count_2014$GEOID <- as.character(zipcode_count_2014$GEOID)
zipcode_count_2015 <- zipcode_count_2015 %>% inner_join(fips_zip, by="zipcode")
zipcode_count_2015$GEOID <- as.character(zipcode_count_2015$GEOID)
zipcode_count_2016 <- zipcode_count_2016 %>% inner_join(fips_zip, by="zipcode")
zipcode_count_2016$GEOID <- as.character(zipcode_count_2016$GEOID)
zipcode_count_2017 <- zipcode_count_2017 %>% inner_join(fips_zip, by="zipcode")
zipcode_count_2017$GEOID <- as.character(zipcode_count_2017$GEOID)
```
```{r}
suppressWarnings(require(RColorBrewer))
geoid_count_2013 <- zipcode_count_2013 %>%group_by(GEOID,year) %>% summarise(GEOID_count = sum(count))
geoid_count_2014 <- zipcode_count_2014 %>%group_by(GEOID,year) %>% summarise(GEOID_count = sum(count))
geoid_count_2015 <- zipcode_count_2015 %>%group_by(GEOID,year) %>% summarise(GEOID_count = sum(count))
geoid_count_2016 <- zipcode_count_2016 %>%group_by(GEOID,year) %>% summarise(GEOID_count = sum(count))
geoid_count_2017 <- zipcode_count_2017 %>%group_by(GEOID,year) %>% summarise(GEOID_count = sum(count))
```
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = FALSE, include=FALSE}
us_county_read <- st_read("data/tl_2019_us_county-1/tl_2019_us_county.shp")
penn_map <- us_county_read %>% filter(STATEFP == '42')
```
```{r}
geoid_2013 <- geo_join(penn_map, geoid_count_2013, 'GEOID', 'GEOID')
geoid_2014 <- geo_join(penn_map, geoid_count_2014, 'GEOID', 'GEOID')
geoid_2015 <- geo_join(penn_map, geoid_count_2015, 'GEOID', 'GEOID')
geoid_2016 <- geo_join(penn_map, geoid_count_2016, 'GEOID', 'GEOID')
geoid_2017 <- geo_join(penn_map, geoid_count_2017, 'GEOID', 'GEOID')
```
```{r}
geoid_2013 <- geoid_2013[!is.na(geoid_2013$GEOID_count),]
geoid_2014 <- geoid_2014[!is.na(geoid_2014$GEOID_count),]
geoid_2015 <- geoid_2015[!is.na(geoid_2015$GEOID_count),]
geoid_2016 <- geoid_2016[!is.na(geoid_2016$GEOID_count),]
geoid_2017 <- geoid_2017[!is.na(geoid_2017$GEOID_count),]
```
```{r, fig.height=22, fig.width=18, warning=FALSE}
suppressWarnings(require(RColorBrewer))
#tgrob <- text_grob("",size = 16)
#title <- as_ggplot(tgrob) + theme(plot.margin = margin(0,0,0,0, "cm"))
geo_2013 <- ggplot(data=geoid_2013) + geom_sf(data = geoid_2013, aes(fill=GEOID_count)) + geom_sf_label(aes(label = NAME)) + scale_fill_gradient(low = "#f5f2f7", high = "#2c0447", limit=c(0, 30000)) + labs(fill = "Number of Dog License") + theme(plot.margin = margin(0,0,0,0, "cm")) + ggtitle("Distribution of Number of Dog License in Allegheny and its nearby counties") + theme(plot.title = element_text(size = 20, face = "bold", hjust=0.5))
geo_2014 <- ggplot(data=geoid_2014) + geom_sf(data = geoid_2014, aes(fill=GEOID_count)) + geom_sf_label(aes(label = NAME)) + scale_fill_gradient(low = "#f5f2f7", high = "#2c0447", limit=c(0, 30000)) + labs(fill = "Number of Dog License") + theme(plot.margin = margin(0,0,0,0, "cm"))
geo_2015 <- ggplot(data=geoid_2015) + geom_sf(data = geoid_2015, aes(fill=GEOID_count)) + geom_sf_label(aes(label = NAME)) + scale_fill_gradient(low = "#f5f2f7", high = "#2c0447", limit=c(0, 30000)) + labs(fill = "Number of Dog License") + theme(plot.margin = margin(0,0,0,0, "cm"))
geo_2016 <- ggplot(data=geoid_2016) + geom_sf(data = geoid_2016, aes(fill=GEOID_count)) + geom_sf_label(aes(label = NAME)) + scale_fill_gradient(low = "#f5f2f7", high = "#2c0447", limit=c(0, 30000)) + labs(fill = "Number of Dog License") + theme(plot.margin = margin(0,0,0,0, "cm"))
geo_2017 <- ggplot(data=geoid_2017) + geom_sf(data = geoid_2017, aes(fill=GEOID_count)) + geom_sf_label(aes(label = NAME)) + scale_fill_gradient(low = "#f5f2f7", high = "#2c0447", limit=c(0, 30000))+ labs(fill = "Number of Dog License") + theme(plot.margin = margin(0,0,0,0, "cm")) + labs(caption = )
ggarrange(geo_2013, geo_2014, geo_2015, geo_2016, geo_2017,
labels = c("2013", "2014", "2015", "2016", "2017"),
ncol = 1, nrow = 6, hjust = -6, vjust = 15) + theme(plot.margin = margin(0,0,0,0, "cm"))
```
### Analysis
From this map, we can tell that this dataset mainly records dog license type for the Allegheny county, some a tiny bit other nearby counties, which is not informed by the dataset description. We can also tell that the total number of dog license has a little bit decrease in 2017, which is kind of uncommon.